Selecting Near-Optimal Learners via Incremental Data Allocation
Authors
Abstract
We study a novel machine learning (ML) problem setting of sequentially allocating small subsets of training data amongst a large set of classifiers. The goal is to select a classifier that will give near-optimal accuracy when trained on all data, while also minimizing the cost of misallocated samples. This is motivated by large modern datasets and ML toolkits with many combinations of learning algorithms and hyper-parameters. Inspired by the principle of "optimism under uncertainty," we propose an innovative strategy, Data Allocation using Upper Bounds (DAUB), which robustly achieves these objectives across a variety of real-world datasets. We further develop substantial theoretical support for DAUB in an idealized setting where the expected accuracy of a classifier trained on n samples can be known exactly. Under these conditions we establish a rigorous sub-linear bound on the regret of the approach (in terms of misallocated data), as well as a rigorous bound on the suboptimality of the selected classifier. Our accuracy estimates using real-world datasets only entail mild violations of the theoretical scenario, suggesting that the practical behavior of DAUB is likely to approach the idealized behavior.
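For intuition, the following is a minimal Python sketch of an upper-confidence data-allocation loop in the spirit of DAUB, assuming scikit-learn-style estimators with fit/score methods. The batch size, the linear learning-curve extrapolation, and the accuracy_on_holdout helper are illustrative assumptions for this sketch, not the authors' exact procedure.

```python
def daub_sketch(learners, X, y, batch_size=500, budget=20):
    """Illustrative upper-confidence data-allocation loop (a sketch, not the exact DAUB algorithm).

    Each learner starts with one small batch of training data; every round, the
    next batch goes to the learner whose projected full-data accuracy (an
    optimistic upper bound extrapolated from its partial learning curve) is highest.
    """
    n_total = len(X)
    alloc = {name: batch_size for name in learners}      # samples granted so far
    curves = {name: [] for name in learners}              # (n_samples, accuracy) points

    def accuracy_on_holdout(clf, n):
        # Hypothetical evaluation helper: train on the first n samples and
        # score on the remaining ones (a real implementation would cross-validate).
        clf.fit(X[:n], y[:n])
        return clf.score(X[n:], y[n:])

    def upper_bound(name):
        # "Optimism under uncertainty": extrapolate the last learning-curve
        # segment linearly to the full dataset, clipped to at most 1.0.
        pts = curves[name]
        if len(pts) < 2:
            return 1.0                                     # maximal optimism before a slope exists
        (n0, a0), (n1, a1) = pts[-2], pts[-1]
        slope = max((a1 - a0) / (n1 - n0), 0.0)
        return min(a1 + slope * (n_total - n1), 1.0)

    for _ in range(budget):
        # Refresh the learning curve of any learner whose allocation grew.
        for name, clf in learners.items():
            n = min(alloc[name], n_total - 1)              # keep at least one holdout sample
            if not curves[name] or curves[name][-1][0] < n:
                curves[name].append((n, accuracy_on_holdout(clf, n)))
        best = max(learners, key=upper_bound)
        alloc[best] += batch_size                          # grant the next batch optimistically

    # Return the learner that looks best at its largest allocation so far.
    return max(learners, key=lambda name: curves[name][-1][1])
```

As a usage example, learners could be a dictionary such as {"logreg": LogisticRegression(), "forest": RandomForestClassifier()} with X and y as NumPy arrays; the loop repeatedly grants the next data batch to whichever learner currently has the most optimistic projected full-data accuracy.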
Similar resources
Cooperative Negotiation in Autonomic Systems using Incremental Utility Elicitation
Decentralized resource allocation is a key problem for large-scale autonomic (or self-managing) computing systems. Motivated by a data center scenario, we explore efficient techniques for resolving resource conflicts via cooperative negotiation. Rather than computing in advance the functional dependence of each element’s utility upon the amount of resource it receives, which could be prohibitiv...
The Effects of Technical and Organizational Activities on Redundancy Allocation Problem with Choice of Selecting Redundancy Strategies Using the Memetic Algorithm
The redundancy allocation problem is one of the most important problems in the reliability area. It involves determining suitable redundancy levels under certain strategies to maximize system reliability subject to constraints. Many modifications have been made to this problem to bring it closer to real-world situations. Selecting the redundancy strategy and using different system configurations are some of...
L2 Writing Feedback Preferences and Their Relationships with Entity vs. Incremental Mindsets of EFL Learners
The present study was aimed at investigating intermediate Iranian EFL learners’ feedback preferences on their L2 writing and examining the possible differences between learners with entity and incremental language mindsets with respect to their feedback preferences. To this end, 150 EFL learners were recruited from several language institutes in Isfahan, Iran, and their language proficiency lev...
Near-Optimal Bayesian Ambiguity Sets for Distributionally Robust Optimization
We propose a Bayesian framework for assessing the relative strengths of data-driven ambiguity sets in distributionally robust optimization (DRO) when the underlying distribution is defined by a finite-dimensional parameter. The key idea is to measure the relative size between a candidate ambiguity set and a specific, asymptotically optimal set. This asymptotically optimal set is provably the sm...